PUMA: Performance Unchanged Model Augmentation for Training Data Removal

Authors

Abstract

Preserving the performance of a trained model while removing the unique characteristics of marked training data points is challenging. Recent research usually suggests retraining a model from scratch with the remaining data, or refining the model by reverting the optimization on the marked data points. Unfortunately, aside from their computational inefficiency, those approaches inevitably hurt the resulting model's generalization ability, since they remove not only the unique characteristics of the marked data but also discard shared (and possibly contributive) information. To address this degradation problem, this paper presents a novel approach called Performance Unchanged Model Augmentation (PUMA). The proposed PUMA framework explicitly models the influence of each training data point with respect to various performance criteria. It then complements the negative impact of removing the marked data by optimally reweighting the remaining data. To demonstrate the effectiveness of the framework, we compared it with multiple state-of-the-art data removal techniques in experiments, where we show that PUMA can effectively and efficiently remove the unique characteristics of the marked data without retraining, such that the resulting model can 1) fool a membership attack and 2) resist performance degradation. In addition, as PUMA estimates data importance during its operation, it could serve to debug mislabelled data points more efficiently than existing approaches.
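As a loose illustration of the reweighting idea described in the abstract (not the authors' implementation), the sketch below uses first-order influence functions for an L2-regularized logistic regression model: after zeroing out the marked points, the remaining points are reweighted so that their aggregate influence on the parameters offsets the influence that was removed. All function names (per_sample_grads, compensating_weights, etc.) and the least-squares weighting step are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of influence-based reweighting for data removal,
# assuming an L2-regularized logistic regression model (hypothetical helpers).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_sample_grads(theta, X, y, lam):
    # Gradient of each sample's regularized log-loss w.r.t. theta, shape (n, d).
    p = sigmoid(X @ theta)
    return (p - y)[:, None] * X + lam * theta

def loss_hessian(theta, X, y, lam):
    # Hessian of the average regularized log-loss, shape (d, d).
    p = sigmoid(X @ theta)
    return (X * (p * (1 - p))[:, None]).T @ X / len(y) + lam * np.eye(X.shape[1])

def compensating_weights(theta, X, y, removed_idx, lam=1e-3):
    # First-order influence of each point on the parameters: H^{-1} * grad_i.
    grads = per_sample_grads(theta, X, y, lam)
    H_inv = np.linalg.inv(loss_hessian(theta, X, y, lam))
    infl = grads @ H_inv                                   # (n, d)
    keep = np.setdiff1d(np.arange(len(y)), removed_idx)
    # Pick extra weights on the remaining points so their aggregate influence
    # matches (and thus offsets) the influence lost by dropping removed_idx.
    target = infl[removed_idx].sum(axis=0)
    delta, *_ = np.linalg.lstsq(infl[keep].T, target, rcond=None)
    weights = np.ones(len(y))
    weights[removed_idx] = 0.0
    weights[keep] += delta
    return weights   # per-sample weights for fine-tuning the model
```

In this toy version the compensation targets the model parameters directly; the paper's framework instead balances the influence with respect to several performance criteria, so the sketch should be read only as a simplified analogue of that idea.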


Similar Articles

Training Data Augmentation for Low-Resource Morphological Inflection

This work describes the UoE-LMU submission for the CoNLL-SIGMORPHON 2017 Shared Task on Universal Morphological Reinflection, Subtask 1: given a lemma and target morphological tags, generate the target inflected form. We evaluate several ways to improve performance in the 1000-example setting: three methods to augment the training data with identical input-output pairs (i.e., autoencoding), a h...


Data Augmentation for Training of Noise Robust Acoustic Models

In this paper we analyse ways to improve the acoustic models based on deep neural networks with the help of data augmentation. These models are used for speech recognition in a priori unknown possibly noisy acoustic environment (with the presence of office or home noise, street noise, babble, etc.) and may deal with both the headset and distant microphone recordings. We compare acoustic models ...


Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions

This research extends our earlier work on using machine translation (MT) and word-based recurrent neural networks to augment language model training data for keyword search in conversational Cantonese speech. MT-based data augmentation is applied to two language pairs: English-Lithuanian and English-Amharic. Using filtered N-best MT hypotheses for language modeling is found to perform better th...


Data augmentation and language model adaptation

A method is presented for augmenting word n-gram counts in a matrix which represents a 2-gram Language Model (LM). This method is based on numerical distances in a reduced space obtained by Singular Value Decomposition (SVD). Rescoring word lattices in a spoken dialogue application using an LM containing augmented counts has led to a Word Error Rate (WER) reduction of 6.5%. By further interpol...


Data augmentation for diffusions

The problem of formal likelihood-based (either classical or Bayesian) inference for discretely observed multi-dimensional diffusions is particularly challenging. In principle this involves data-augmentation of the observation data to give representations of the entire diffusion trajectory. Most currently proposed methodology splits broadly into two classes: either through the discretisation of ...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i8.20846